Collaborative recommendation is an information-filtering technique that
attempts to present information items that are likely of interest to an
Internet user. Traditionally, collaborative systems deal with situations
with two types of variables, users and items. In its most common form, the
problem is framed as trying to estimate ratings for items that have not yet
been consumed by a user. Despite wide-ranging literature, little is known
about the statistical properties of recommendation systems. In fact, no
clear probabilistic model even exists which would allow us to precisely
describe the mathematical forces driving collaborative filtering. To provide
an initial contribution to this, we propose to set out a general sequential
stochastic model for collaborative recommendation. We offer an in-depth
analysis of the so-called cosine-type nearest neighbor collaborative method,
which is one of the most widely used algorithms in collaborative filtering,
and analyze its asymptotic performance as the number of users grows. We
establish consistency of the procedure under mild assumptions on the model.
Rates of convergence and examples are also provided.

1. Introduction. Collaborative recommendation is a Web information- 
filtering technique that typically gathers information about your personal 
interests and compares your profile to other users with similar tastes. The 
goal of this system is to give personalized recommendations, whether this 
be movies you might enjoy, books you should read or the next restaurant 
you should go to. 

There has been much work done in this area over the past decade since
the appearance of the first papers on the subject in the mid-90s [11, 13, 16].
Stimulated by an abundance of practical applications, most of the research
activity to date has focused on elaborating various heuristics and practical
methods [4, 10, 14] so as to provide personalized recommendations and help
Web users deal with information overload. Examples of such applications
include recommending books, people, restaurants, movies, CDs and news.
Websites such as amazon.com, match.com, movielens.org and allmusic.com
already have recommendation systems in operation. We refer the reader to
the surveys by [3] and [2] for a broader picture of the field, an overview of
results and many related references.

Table 1

A (subset of a) ratings matrix for a movie recommendation system. Ratings are specified
on a scale from 1 to 10, and "NA" means that the user has not rated the corresponding
film

          Armageddon  Platoon  Rambo  Rio Bravo  Star Wars  Titanic
Jim       NA          6        7      8          9          NA
James     3           NA       10     NA         5          7
Steve     7           NA       1      NA         6          NA
Mary      NA          7        1      NA         5          6
John      NA          7        NA     NA         3          1
Lucy      3           10       2      7          NA         4
Stan      NA          7        NA     NA         1          NA
Johanna   4           5        NA     8          3          9
Bob       NA          3        3      4          5          ?

Traditionally, collaborative systems deal with situations with two types of 
variables, users and items. In its most common form, the problem is framed 
as trying to estimate ratings for items that have not yet been consumed by 
a user. The recommendation process typically starts by asking users a series 
of questions about items they liked or did not like. For example, in a movie 
recommendation system, users initially rate some subset of films they have 
already seen. Personal ratings are then collected in a matrix where each row 
represents a user, each column an item, and entries in the matrix represent 
a given user's rating of a given item. An example is presented in Table 1 
where ratings are specified on a scale from 1 to 10, and "NA" means that 
the user has not rated the corresponding film. 

Based on this prior information, the recommendation engine must be able 
to automatically furnish ratings of as-yet unrated items and then suggest 
appropriate recommendations based on these predictions. To do this, a num- 
ber of practical methods have been proposed, including machine learning- 
oriented techniques [1], statistical approaches [15] and numerous other ad 
hoc rules [2]. The collaborative filtering issue may be viewed as a special
instance of the problem of inferring the many missing entries of a data
matrix. This field, which has very recently emerged, is known as the matrix
completion problem and comes up in many areas of science and engineering,
including collaborative filtering, machine learning, control, remote sensing
and computer vision. We will not pursue this promising approach, and refer
the reader to [5] and [6], who survey the literature on matrix completion.
These authors show in particular that, under suitable conditions, one can
recover an unknown low-rank matrix from a nearly minimal set of entries
by solving a simple convex optimization problem.

In most of the approaches, the crux is to identify users whose tastes/rat- 
ings are "similar" to the user we would like to advise. The similarity measure 
assessing proximity between users may vary depending on the type of appli- 
cation but is typically based on a correlation or cosine-type approach [15]. 

Despite wide-ranging literature, very little is known about the statistical 
properties of recommendation systems. In fact, no clear probabilistic model 
even exists allowing us to precisely describe the mathematical forces driving 
collaborative filtering. To provide an initial contribution to this, we propose 
in the present paper to set out a general stochastic model for collaborative 
recommendation and analyze its asymptotic performance as the number of 
users grows. 

The document is organized as follows. In Section 2, we provide a sequential 
stochastic model for collaborative recommendation and describe the statis- 
tical problem. In the model we analyze, unrated items are estimated by 
averaging ratings of users who are "similar" to the user we would like to
advise. The similarity is assessed by a cosine-type measure, and unrated items
are estimated using a $k_n$-nearest neighbor-type regression estimate which is
indeed one of the most widely used procedures in collaborative filtering. It
turns out that the choice of the cosine proximity as a similarity measure 
imposes constraints on the model which are discussed in Section 3. Under 
mild assumptions, consistency of the estimation procedure is established in 
Section 4 whereas rates of convergence are discussed in Section 5. Illustrative 
examples are given throughout the document, and proofs of some technical 
results are postponed to Section 6. 

2. A model for collaborative recommendation. 

2.1. Ratings matrix and new users. Suppose that there are $d+1$ ($d \geq 1$)
possible items, $n$ users in the ratings matrix (i.e., the database) and that
users' ratings take values in the set $(\{0\} \cup [1,s])^{d+1}$. Here, $s$ is a
real number greater than 1 corresponding to the maximal rating, and, by
convention, the symbol 0 means that the user has not rated the item (same as
"NA"). Thus the ratings matrix has $n$ rows, $d+1$ columns and entries from
$\{0\} \cup [1,s]$. For example, $n = 8$, $d = 5$ and $s = 10$ in Table 1, which
will be our toy example throughout this section. Then a new user, Bob, reveals
some of his preferences for the first time, rating some of the first $d$ items
but not the $(d+1)$th (the movie Titanic in Table 1). We want to design a
strategy to predict Bob's rating of Titanic using: (i) Bob's ratings of some
(or all) of the other $d$ movies and (ii) the ratings matrix. This is
illustrated in Table 1, where Bob has rated 4 out of the 5 movies.

The first step in our approach is to model the preferences of the new
user, Bob, by a random vector $(\mathbf{X}, Y)$ of size $d+1$ taking values in
the set $[1,s]^d \times [1,s]$. Within this framework, the random variable
$\mathbf{X} = (X_1, \ldots, X_d)$ represents Bob's preferences pertaining to the
first $d$ movies whereas $Y$, the (unobserved) variable of interest, refers to
the movie Titanic. In fact, as Bob does not necessarily reveal all his
preferences at once, we do not observe the variable $\mathbf{X}$, but instead
some "masked" version of it denoted hereafter by $\mathbf{X}^*$. The random
variable $\mathbf{X}^* = (X_1^*, \ldots, X_d^*)$ is naturally defined by

$$X_j^* = \begin{cases} X_j, & \text{if } j \in M,\\ 0, & \text{otherwise,} \end{cases}$$

where $M$ stands for some nonempty random subset of $\{1, \ldots, d\}$ indexing
the movies which have been rated by Bob. Observe that the random variable
$\mathbf{X}^*$ takes values in $(\{0\} \cup [1,s])^d$ and that
$\|\mathbf{X}^*\| \geq 1$, where $\|\cdot\|$ denotes the usual Euclidean norm on
$\mathbb{R}^d$. In the example of Table 1, $M = \{2, 3, 4, 5\}$ and (the
realization of) $\mathbf{X}^*$ is $(0, 3, 3, 4, 5)$.

We follow the same approach to model preferences of users already in
the database (Jim, James, Steve, Mary, etc. in Table 1), who will therefore
be represented by a sequence of independent $[1,s]^d \times [1,s]$-valued random
pairs $(\mathbf{X}_1, Y_1), \ldots, (\mathbf{X}_n, Y_n)$ from the distribution
$(\mathbf{X}, Y)$. A first idea for dealing with potential nonresponses of a
user $i$ in the ratings matrix ($i = 1, \ldots, n$) is to consider in place of
$\mathbf{X}_i = (X_{i1}, \ldots, X_{id})$, its masked version
$\tilde{\mathbf{X}}_i = (\tilde{X}_{i1}, \ldots, \tilde{X}_{id})$ defined by

$$(2.1) \qquad \tilde{X}_{ij} = \begin{cases} X_{ij}, & \text{if } j \in M_i \cap M,\\ 0, & \text{otherwise,} \end{cases}$$

where each $M_i$ is the random subset of $\{1, \ldots, d\}$ indexing the movies
which have been rated by user $i$. In other words, we only keep in
$\tilde{\mathbf{X}}_i$ items corated by both user $i$ and the new user; items
which have not been rated by both $\mathbf{X}$ and $\mathbf{X}_i$ are declared
noninformative and simply thrown away.

However, this model, which is static in nature, does not allow us to take into
account the fact that, as time goes by, each user in the database may reveal 
more and more preferences. This will, for instance, typically be the case in 
the movie recommendation system of Table 1 where regular customers will 
update their ratings each time they have seen a new movie. Consequently, 
model (2.1) is not fully satisfying and must therefore be slightly modified to 
better capture the sequential evolution of ratings.

Table 2

A sequential model for preference updating

          Time 1    Time 2    ...   Time i       ...   Time n
User 1    M_1^1     M_1^2     ...   M_1^i        ...   M_1^n
User 2              M_2^1     ...   M_2^{i-1}    ...   M_2^{n-1}
...
User i                              M_i^1        ...   M_i^{n+1-i}
...
User n                                                 M_n^1


2.2. A sequential model. A possible dynamical approach for collaborative
recommendation is based on the following protocol: users enter the
database one after the other and update their list of ratings sequentially in
time. More precisely, we suppose that at each time $i = 1, 2, \ldots$, a new user
enters the process and reveals his preferences for the first time while the
$i - 1$ previous users are allowed to rate new items. Thus, at time 1, there is
only one user in the database (Jim in Table 1), and the (nonempty) subset of
items he decides to rate is modeled by a random variable $M_1^1$ taking values
in $\mathcal{P}^*(\{1, \ldots, d\})$, the set of nonempty subsets of
$\{1, \ldots, d\}$. At time 2, a new user (James) enters the game and reveals his
preferences according to a $\mathcal{P}^*(\{1, \ldots, d\})$-valued random
variable $M_2^1$, with the same distribution as $M_1^1$. At the same time, Jim
(user 1) may update his list of preferences, modeled by a random variable
$M_1^2$ satisfying $M_1^1 \subset M_1^2$. The latter requirement just means that
the user is allowed to rate new items but not to remove his past ratings. At
time 3, a new user (Steve) rates items according to a random variable $M_3^1$
distributed as $M_1^1$, while user 2 updates his preferences according to
$M_2^2$ (distributed as $M_1^2$) and user 1 updates his own according to
$M_1^3$, and so on. This sequential mechanism is summarized in Table 2.

By repeating this procedure, we end up at time $n$ with an upper triangular
array $(M_i^j)_{1 \leq i \leq n,\, 1 \leq j \leq n+1-i}$ of random variables. A
row in this array consists of a collection $M_i^j$ of random variables for a
given value of $i$, taking values in $\mathcal{P}^*(\{1, \ldots, d\})$ and
satisfying the constraint $M_i^j \subset M_i^{j+1}$. For a fixed $i$, the
sequence $M_i^1 \subset M_i^2 \subset \cdots$ describes the (random) way user $i$
sequentially reveals his preferences over time. Observe that these inclusions
are not necessarily strict, so that a single user is not forced to rate one
more item at every single step.
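
The triangular array of rating sets can be simulated to make the protocol concrete. The following is only a sketch under assumptions that are not part of the model: the initial set is drawn uniformly among subsets of a uniformly chosen size, and at each later time a user adds one unrated item with probability `add_prob`. The function name and parameters are hypothetical.

```python
import random

def simulate_rating_sets(n, d, add_prob=0.5, seed=0):
    """Return {(i, j): M_i^j} for 1 <= i <= n and 1 <= j <= n + 1 - i.

    Illustrative assumptions: user i starts with a random nonempty subset of
    {1, ..., d}; at each later step he adds one unrated item with probability
    add_prob and never removes a past rating.
    """
    rng = random.Random(seed)
    items = list(range(1, d + 1))
    array = {}
    for i in range(1, n + 1):
        # First set of ratings of user i (nonempty by construction).
        current = set(rng.sample(items, rng.randint(1, d)))
        for j in range(1, n + 2 - i):
            if j > 1:
                unrated = [a for a in items if a not in current]
                if unrated and rng.random() < add_prob:
                    current = current | {rng.choice(unrated)}
            array[(i, j)] = frozenset(current)
    return array

rating_sets = simulate_rating_sets(n=6, d=5, seed=1)
```

Each row `i` of `rating_sets` is a nondecreasing sequence of nonempty subsets, matching the nestedness constraint of the model.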

Throughout the paper, we will assume that, for each $i$, the distribution
of the sequence of random variables $(M_i^n)_{n \geq 1}$ is independent of $i$
and is therefore distributed as a generic random sequence denoted
$(M^n)_{n \geq 1}$, satisfying $M^1 \neq \emptyset$ and $M^n \subset M^{n+1}$ for
all $n \geq 1$. For the sake of coherence, we assume that $M^1$ and $M$ [see
(2.1)] have the same distribution; that is, the new abstract user
$\mathbf{X}^*$ may be regarded as a user entering the database for the first
time. We will also suppose that there exists a positive random integer $n_0$
such that $M^{n_0} = \{1, \ldots, d\}$, and, consequently,
$M^n = \{1, \ldots, d\}$ for all $n \geq n_0$. This requirement means that each
user rates all $d$ items after a (random) period of time. Last, we will assume
that the pairs $(\mathbf{X}_i, Y_i)$, $i = 1, \ldots, n$, the sequences
$(M_1^n)_{n \geq 1}, (M_2^n)_{n \geq 1}, \ldots$ and the random variable $M$ are
mutually independent. We note that this implies that the users' ratings are
independent.

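With this sequential point of view, improving on (2.1), we let the masked
version $\mathbf{X}_i^{(n)} = (X_{i1}^{(n)}, \ldots, X_{id}^{(n)})$ of
$\mathbf{X}_i$ be defined as

$$X_{ij}^{(n)} = \begin{cases} X_{ij}, & \text{if } j \in M_i^{n+1-i} \cap M,\\ 0, & \text{otherwise.} \end{cases}$$

Again, it is worth pointing out that, in the definition of
$\mathbf{X}_i^{(n)}$, items which have not been corated by both $\mathbf{X}$ and
$\mathbf{X}_i$ are deleted. This implies in particular that
$\mathbf{X}_i^{(n)}$ may be equal to $\mathbf{0}$, the $d$-dimensional null
vector (whereas $\|\mathbf{X}^*\| \geq 1$ by construction).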

Finally, in order to deal with possible nonanswers of database users
regarding the variable of interest (Titanic in our movie example), we introduce
$(\mathcal{R}_n)_{n \geq 1}$, a sequence of random variables taking values in
$\mathcal{P}^*(\{1, \ldots, n\})$, such that $\mathcal{R}_n$ is independent of
$M$ and the sequences $(M_i^n)_{n \geq 1}$, and satisfying
$\mathcal{R}_n \subset \mathcal{R}_{n+1}$ for all $n \geq 1$. In this formalism,
$\mathcal{R}_n$ represents the subset, which is assumed to be nonempty, of users
who have already provided information about Titanic at time $n$. For example,
in Table 1, only James, Mary, John, Lucy and Johanna have rated Titanic and
therefore (the realization of) $\mathcal{R}_n$ is $\{2, 4, 5, 6, 8\}$.

2.3. The statistical problem. To summarize the model so far, we have
at hand at time $n$ a sample of random pairs
$(\mathbf{X}_1^{(n)}, Y_1), \ldots, (\mathbf{X}_n^{(n)}, Y_n)$ and our mission is
to predict the score $Y$ of a new user represented by $\mathbf{X}^*$. The
variables $\mathbf{X}_1^{(n)}, \ldots, \mathbf{X}_n^{(n)}$ model the database
users' revealed preferences with respect to the first $d$ items. They take
values in $(\{0\} \cup [1,s])^d$, where a 0 at coordinate $j$ of
$\mathbf{X}_i^{(n)}$ means that the $j$th product has not been corated by both
user $i$ and the new user. The variable $\mathbf{X}^*$ takes values in
$(\{0\} \cup [1,s])^d$ and satisfies $\|\mathbf{X}^*\| \geq 1$. The random
variables $Y_1, \ldots, Y_n$ model users' ratings of the product of interest.
They take values in $[1,s]$ and, at time $n$, we only see a nonempty (random)
subset of $\{Y_1, \ldots, Y_n\}$, indexed by $\mathcal{R}_n$.

The statistical problem with which we are faced is to estimate the
regression function $\eta(\mathbf{x}^*) = \mathbb{E}[Y \mid \mathbf{X}^* = \mathbf{x}^*]$.
For this goal, we may use the database observations
$(\mathbf{X}_1^{(n)}, Y_1), \ldots, (\mathbf{X}_n^{(n)}, Y_n)$ in order to
construct an estimate $\eta_n(\mathbf{x}^*)$ of $\eta(\mathbf{x}^*)$. The
approach we explore in this paper is a cosine-based $k_n$-nearest neighbor
regression method, one of the most widely used algorithms in collaborative
filtering (see, e.g., [15]).

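Given $\mathbf{x}^* \in (\{0\} \cup [1,s])^d - \mathbf{0}$ and the sample
$(\mathbf{X}_1^{(n)}, Y_1), \ldots, (\mathbf{X}_n^{(n)}, Y_n)$, the idea of the
cosine-type $k_n$-nearest neighbor (NN) regression method is to estimate
$\eta(\mathbf{x}^*)$ by a local averaging over those $Y_i$ for which: (i)
$\mathbf{X}_i^{(n)}$ is "close" to $\mathbf{x}^*$, and (ii)
$i \in \mathcal{R}_n$, that is, we effectively "see" the rating $Y_i$. For
this, we scan through the $k_n$ neighbors of $\mathbf{x}^*$ among the database
users $\mathbf{X}_i^{(n)}$ for which $i \in \mathcal{R}_n$ and estimate
$\eta(\mathbf{x}^*)$ by averaging the corresponding $Y_i$. The closeness between
users is assessed by a cosine-type similarity, defined for
$\mathbf{x} = (x_1, \ldots, x_d)$ and $\mathbf{x}' = (x_1', \ldots, x_d')$ in
$(\{0\} \cup [1,s])^d$ by

$$\bar{S}(\mathbf{x}, \mathbf{x}') = \frac{\sum_{j \in \mathcal{J}} x_j x_j'}{\sqrt{\sum_{j \in \mathcal{J}} x_j^2}\, \sqrt{\sum_{j \in \mathcal{J}} x_j'^2}},$$

where $\mathcal{J} = \{j \in \{1, \ldots, d\} : x_j \neq 0 \text{ and } x_j' \neq 0\}$,
and, by convention, $\bar{S}(\mathbf{x}, \mathbf{x}') = 0$ if
$\mathcal{J} = \emptyset$. To understand the rationale behind this proximity
measure, just note that if $\mathcal{J} = \{1, \ldots, d\}$ then
$\bar{S}(\mathbf{x}, \mathbf{x}')$ coincides with $\cos(\mathbf{x}, \mathbf{x}')$;
that is, two users are "close" with respect to $\bar{S}$ if their ratings are
more or less proportional. However, the similarity $\bar{S}$, which will be used
to measure the closeness between $\mathbf{X}^*$ (the new user) and
$\mathbf{X}_i^{(n)}$ (a database user), ignores possible nonanswers in
$\mathbf{X}^*$ or $\mathbf{X}_i^{(n)}$, and is therefore more adapted to the
recommendation setting. For example, in Table 1,

$$\bar{S}(\text{Bob}, \text{Jim}) = \bar{S}((0, 3, 3, 4, 5), (0, 6, 7, 8, 9)) = \bar{S}((3, 3, 4, 5), (6, 7, 8, 9)) \approx 0.99,$$

whereas

$$\bar{S}(\text{Bob}, \text{Lucy}) = \bar{S}((0, 3, 3, 4, 5), (3, 10, 2, 7, 0)) = \bar{S}((3, 3, 4), (10, 2, 7)) \approx 0.89.$$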
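
The two numerical values above can be reproduced with a few lines of code. This is a minimal sketch of the cosine-type similarity restricted to corated items, with 0 encoding "not rated" as in the text; the function name is hypothetical.

```python
import math

def cosine_type_similarity(x, x_prime):
    """Cosine computed only over coordinates that are nonzero in both vectors."""
    common = [j for j, (a, b) in enumerate(zip(x, x_prime)) if a != 0 and b != 0]
    if not common:
        return 0.0  # convention from the text: no corated item gives similarity 0
    dot = sum(x[j] * x_prime[j] for j in common)
    norm_x = math.sqrt(sum(x[j] ** 2 for j in common))
    norm_xp = math.sqrt(sum(x_prime[j] ** 2 for j in common))
    return dot / (norm_x * norm_xp)

bob = (0, 3, 3, 4, 5)
jim = (0, 6, 7, 8, 9)
lucy = (3, 10, 2, 7, 0)

sim_bob_jim = cosine_type_similarity(bob, jim)    # about 0.996
sim_bob_lucy = cosine_type_similarity(bob, lucy)  # about 0.887
```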

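Next, fix $\mathbf{x}^* \in (\{0\} \cup [1,s])^d - \mathbf{0}$, and suppose for
simplification that $M \subset M_i^{n+1-i}$ for each $i \in \mathcal{R}_n$. In
this case, it is easy to see that
$\mathbf{X}_i^{(n)} = \mathbf{X}_i^* = (X_{i1}^*, \ldots, X_{id}^*)$ where

$$X_{ij}^* = \begin{cases} X_{ij}, & \text{if } j \in M,\\ 0, & \text{otherwise.} \end{cases}$$

Besides, $Y_i \geq 1$,

$$(2.2) \qquad \bar{S}(\mathbf{x}^*, \mathbf{X}_i^*) = \cos(\mathbf{x}^*, \mathbf{X}_i^*) > 0$$

and an elementary calculation shows that the positive real number $y$ which
maximizes the similarity between $(\mathbf{x}^*, y)$ and $(\mathbf{X}_i^*, Y_i)$,
that is,

$$\bar{S}\big((\mathbf{x}^*, y), (\mathbf{X}_i^*, Y_i)\big) = \frac{\sum_{j \in M} x_j^* X_{ij}^* + y Y_i}{\sqrt{\|\mathbf{x}^*\|^2 + y^2}\, \sqrt{\|\mathbf{X}_i^*\|^2 + Y_i^2}},$$

is given by

$$y = \frac{\|\mathbf{x}^*\|}{\cos(\mathbf{x}^*, \mathbf{X}_i^*)} \frac{Y_i}{\|\mathbf{X}_i^*\|}.$$

This suggests the following regression estimate $\eta_n(\mathbf{x}^*)$ of
$\eta(\mathbf{x}^*)$:

$$(2.3) \qquad \eta_n(\mathbf{x}^*) = \|\mathbf{x}^*\| \sum_{i \in \mathcal{R}_n} W_{ni}(\mathbf{x}^*) \frac{Y_i}{\|\mathbf{X}_i^*\|},$$

where the integer $k_n$ satisfies $1 \leq k_n \leq n$ and

$$W_{ni}(\mathbf{x}^*) = \begin{cases} 1/k_n, & \text{if } \mathbf{X}_i^* \text{ is among the } k_n\text{-MS of } \mathbf{x}^* \text{ in } \{\mathbf{X}_i^*, i \in \mathcal{R}_n\},\\ 0, & \text{otherwise.} \end{cases}$$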



In the above definition, the acronym "MS" (for most similar) means that we
are searching for the $k_n$ "closest" points of $\mathbf{x}^*$ within the set
$\{\mathbf{X}_i^*, i \in \mathcal{R}_n\}$ using the similarity $\bar{S}$ or,
equivalently here, using the cosine proximity [by identity (2.2)]. Note that
the cosine term has been removed since it has asymptotically no influence on
the estimate, as can be seen by a slight adaptation of the arguments of the
proof of Lemma 6.1, Chapter 6 in [9]. The estimate $\eta_n(\mathbf{x}^*)$ is
called the cosine-type $k_n$-NN regression estimate in the collaborative
filtering literature. Now, recalling that definition (2.3) makes sense only
when $M \subset M_i^{n+1-i}$ for each $i \in \mathcal{R}_n$ (that is,
$\mathbf{X}_i^{(n)} = \mathbf{X}_i^*$), the next step is to extend the
definition of $\eta_n(\mathbf{x}^*)$ to the general case. In view of (2.3), the
most natural approach is to simply put

$$(2.4) \qquad \eta_n(\mathbf{x}^*) = \|\mathbf{x}^*\| \sum_{i \in \mathcal{R}_n} W_{ni}(\mathbf{x}^*) \frac{Y_i}{\|\mathbf{X}_i^{(n)}\|},$$

where

$$W_{ni}(\mathbf{x}^*) = \begin{cases} 1/k_n, & \text{if } \mathbf{X}_i^{(n)} \text{ is among the } k_n\text{-MS of } \mathbf{x}^* \text{ in } \{\mathbf{X}_i^{(n)}, i \in \mathcal{R}_n\},\\ 0, & \text{otherwise.} \end{cases}$$

The acronym "MS" in the weight $W_{ni}(\mathbf{x}^*)$ means that the $k_n$
closest database points of $\mathbf{x}^*$ are computed according to the
similarity

$$S(\mathbf{x}^*, \mathbf{X}_i^{(n)}) = p_i^{(n)}\, \bar{S}(\mathbf{x}^*, \mathbf{X}_i^{(n)}) \quad \text{with} \quad p_i^{(n)} = \frac{|M_i^{n+1-i} \cap M|}{|M|}$$

(here and throughout, notation $|A|$ means the cardinality of the finite set
$A$). The factor $p_i^{(n)}$ in front of $\bar{S}$ is a penalty term which,
roughly, avoids over promotion of the last users entering the database. Indeed,
the effective number of items rated by these users will be eventually low, and,
consequently, their $\bar{S}$-proximity to $\mathbf{x}^*$ will tend to remain
high. On the other hand, for fixed $i$ and $n$ large enough, we know that
$M \subset M_i^{n+1-i}$ and $\mathbf{X}_i^{(n)} = \mathbf{X}_i^*$. This implies
$p_i^{(n)} = 1$,
$S(\mathbf{x}^*, \mathbf{X}_i^{(n)}) = \bar{S}(\mathbf{x}^*, \mathbf{X}_i^*) = \cos(\mathbf{x}^*, \mathbf{X}_i^*)$
and shows that definition (2.4) generalizes definition (2.3). Therefore, we
take the liberty to still call the estimate (2.4) the cosine-type $k_n$-NN
regression estimate.
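
As an illustration, estimate (2.4) can be evaluated on the toy data of Table 1. The sketch below rests on assumptions made only for the example: the table is read as a snapshot at time n, so that each user's currently rated set plays the role of his rating set at that time, and the choice k = 2 is arbitrary. Function names are hypothetical.

```python
import math

NA = 0  # "not rated" is encoded by 0, as in the text

def corated_cosine(x, y):
    common = [j for j in range(len(x)) if x[j] != NA and y[j] != NA]
    if not common:
        return 0.0
    dot = sum(x[j] * y[j] for j in common)
    nx = math.sqrt(sum(x[j] ** 2 for j in common))
    ny = math.sqrt(sum(y[j] ** 2 for j in common))
    return dot / (nx * ny)

def knn_estimate(x_star, database, targets, k):
    """Cosine-type k-NN estimate; targets[i] is None when user i has not rated the item."""
    rated_by_new = {j for j, v in enumerate(x_star) if v != NA}  # the set M
    scored = []
    for i, (x_i, y_i) in enumerate(zip(database, targets)):
        if y_i is None:
            continue  # user i does not belong to R_n
        # Keep only the items corated by user i and the new user.
        masked = [v if j in rated_by_new else NA for j, v in enumerate(x_i)]
        norm = math.sqrt(sum(v * v for v in masked))
        penalty = sum(1 for v in masked if v != NA) / len(rated_by_new)
        similarity = penalty * corated_cosine(x_star, masked)
        contribution = y_i / norm if norm > 0 else 0.0  # zero weight for a null vector
        scored.append((similarity, i, contribution))
    if len(scored) < k:
        return 0.0  # convention when fewer than k users rated the item
    scored.sort(key=lambda t: (-t[0], t[1]))  # ties broken by index
    x_norm = math.sqrt(sum(v * v for v in x_star))
    return x_norm * sum(t[2] for t in scored[:k]) / k

database = [
    (0, 6, 7, 8, 9),   # Jim
    (3, 0, 10, 0, 5),  # James
    (7, 0, 1, 0, 6),   # Steve
    (0, 7, 1, 0, 5),   # Mary
    (0, 7, 0, 0, 3),   # John
    (3, 10, 2, 7, 0),  # Lucy
    (0, 7, 0, 0, 1),   # Stan
    (4, 5, 0, 8, 3),   # Johanna
]
titanic = [None, 7, None, 6, 1, 4, None, 9]
bob = (0, 3, 3, 4, 5)

prediction = knn_estimate(bob, database, titanic, k=2)
```

With these choices the two most similar users are Lucy and Johanna, and the predicted rating of Titanic for Bob is about 4.73.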

Remark 2.1. A smoothed version of the similarity $S$ could also be
considered, typically,

$$\tilde{S}(\mathbf{x}^*, \mathbf{X}_i^{(n)}) = \psi(p_i^{(n)})\, \bar{S}(\mathbf{x}^*, \mathbf{X}_i^{(n)}),$$

where $\psi : [0, 1] \to [0, 1]$ is a nondecreasing map satisfying
$\psi(1/2) < 1$ (assuming $|M| \geq 2$). For example, the choice
$\psi(p) = \sqrt{p}$ tends to promote users with a low number of rated items,
provided the items corated by the new user are quite similar. In the present
paper, we shall only consider the case $\psi(p) = p$, but the whole analysis
carries over without difficulties for general functions $\psi$.

Remark 2.2. Another popular approach to measure the closeness be- 
tween users is the Pearson correlation coefficient. The extension of our results 
to Pearson-type similarities is not straightforward and more work is needed 
to address this challenging question. We refer the reader to [7] and [12] for 
a comparative study and comments on the choice of the similarity. 

Finally, for definiteness of the estimate $\eta_n(\mathbf{x}^*)$, some final
remarks are in order:

(i) If $S(\mathbf{x}^*, \mathbf{X}_i^{(n)}) = S(\mathbf{x}^*, \mathbf{X}_j^{(n)})$,
i.e., $\mathbf{X}_i^{(n)}$ and $\mathbf{X}_j^{(n)}$ are equidistant from
$\mathbf{x}^*$, then we have a tie, and, for example, $\mathbf{X}_i^{(n)}$ may
be declared "closer" to $\mathbf{x}^*$ if $i < j$; that is, tie-breaking is done
by indices.

(ii) If $|\mathcal{R}_n| < k_n$, then the weights $W_{ni}(\mathbf{x}^*)$ are not
defined. In this case, we conveniently set $W_{ni}(\mathbf{x}^*) = 0$; that is,
$\eta_n(\mathbf{x}^*) = 0$.

(iii) If $\mathbf{X}_i^{(n)} = \mathbf{0}$, then we take
$W_{ni}(\mathbf{x}^*) = 0$, and we adopt the convention $0 \times \infty = 0$
for the computation of $\eta_n(\mathbf{x}^*)$.

(iv) With the above conventions, the identity
$\sum_{i \in \mathcal{R}_n} W_{ni}(\mathbf{x}^*) \leq 1$ holds in each case.

3. The regression function. Our objective in Section 4 will be to establish
consistency of the estimate $\eta_n(\mathbf{x}^*)$ defined in (2.4) toward the
regression function $\eta(\mathbf{x}^*)$. To reach this goal, we first need to
analyze the properties of $\eta(\mathbf{x}^*)$. Surprisingly, the special form
of $\eta_n(\mathbf{x}^*)$ constrains the shape of $\eta(\mathbf{x}^*)$. This is
stated in Theorem 3.1 below.



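Theorem 3.1. Suppose that $\eta_n(\mathbf{X}^*) \to \eta(\mathbf{X}^*)$ in
probability as $n \to \infty$. Then

$$\eta(\mathbf{X}^*) = \|\mathbf{X}^*\|\, \mathbb{E}\left[\frac{Y}{\|\mathbf{X}^*\|} \,\Big|\, \frac{\mathbf{X}^*}{\|\mathbf{X}^*\|}\right] \quad \text{a.s.}$$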



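Proof. Recall that

$$\eta_n(\mathbf{X}^*) = \|\mathbf{X}^*\| \sum_{i \in \mathcal{R}_n} W_{ni}(\mathbf{X}^*) \frac{Y_i}{\|\mathbf{X}_i^{(n)}\|}$$

and let

$$\varphi_n(\mathbf{X}^*) = \sum_{i \in \mathcal{R}_n} W_{ni}(\mathbf{X}^*) \frac{Y_i}{\|\mathbf{X}_i^{(n)}\|}.$$

Since $(\eta_n(\mathbf{X}^*))_n$ is a Cauchy sequence in probability and
$\|\mathbf{X}^*\| \geq 1$, the sequence $(\varphi_n(\mathbf{X}^*))_n$ is also a
Cauchy sequence. Thus there exists a measurable function $\varphi$ on
$\mathbb{R}^d$ such that $\varphi_n(\mathbf{X}^*) \to \varphi(\mathbf{X}^*)$ in
probability. Using the fact that $0 \leq \varphi_n(\mathbf{X}^*) \leq s$ for all
$n \geq 1$, we conclude that $0 \leq \varphi(\mathbf{X}^*) \leq s$ a.s. as well.

Let us extract a sequence $(n_k)_k$ satisfying
$\varphi_{n_k}(\mathbf{X}^*) \to \varphi(\mathbf{X}^*)$ a.s. Observing that, for
$\mathbf{X}^* \neq \mathbf{0}$,

$$\varphi_{n_k}(\mathbf{X}^*) = \varphi_{n_k}\left(\frac{\mathbf{X}^*}{\|\mathbf{X}^*\|}\right),$$

we may write $\varphi(\mathbf{X}^*) = \varphi(\mathbf{X}^*/\|\mathbf{X}^*\|)$
a.s. Consequently, the limit in probability of $(\eta_n(\mathbf{X}^*))_n$ is

$$\|\mathbf{X}^*\|\, \varphi\left(\frac{\mathbf{X}^*}{\|\mathbf{X}^*\|}\right).$$

Therefore, by the uniqueness of the limit,
$\eta(\mathbf{X}^*) = \|\mathbf{X}^*\| \varphi(\mathbf{X}^*/\|\mathbf{X}^*\|)$
a.s. Moreover,

$$\varphi\left(\frac{\mathbf{X}^*}{\|\mathbf{X}^*\|}\right) = \mathbb{E}\left[\varphi\left(\frac{\mathbf{X}^*}{\|\mathbf{X}^*\|}\right) \Big|\, \frac{\mathbf{X}^*}{\|\mathbf{X}^*\|}\right] = \mathbb{E}\left[\frac{\eta(\mathbf{X}^*)}{\|\mathbf{X}^*\|} \,\Big|\, \frac{\mathbf{X}^*}{\|\mathbf{X}^*\|}\right]$$

$$= \mathbb{E}\left[\mathbb{E}\left[\frac{Y}{\|\mathbf{X}^*\|} \,\Big|\, \mathbf{X}^*\right] \Big|\, \frac{\mathbf{X}^*}{\|\mathbf{X}^*\|}\right] = \mathbb{E}\left[\frac{Y}{\|\mathbf{X}^*\|} \,\Big|\, \frac{\mathbf{X}^*}{\|\mathbf{X}^*\|}\right],$$

since $\sigma(\mathbf{X}^*/\|\mathbf{X}^*\|) \subset \sigma(\mathbf{X}^*)$. This
completes the proof of the theorem. □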

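An important consequence of Theorem 3.1 is that if we intend to prove
any consistency result regarding the estimate $\eta_n(\mathbf{x}^*)$, then we
have to assume that the regression function $\eta(\mathbf{x}^*)$ has the special
form

$$\eta(\mathbf{x}^*) = \|\mathbf{x}^*\|\, \varphi(\mathbf{x}^*) \quad \text{where} \quad \varphi(\mathbf{x}^*) = \mathbb{E}\left[\frac{Y}{\|\mathbf{X}^*\|} \,\Big|\, \frac{\mathbf{X}^*}{\|\mathbf{X}^*\|} = \frac{\mathbf{x}^*}{\|\mathbf{x}^*\|}\right].$$

This will be our fundamental requirement throughout the paper, and it
will be denoted by (F). In particular, if
$\tilde{\mathbf{x}}^* = \lambda \mathbf{x}^*$ with $\lambda > 0$, then
$\eta(\tilde{\mathbf{x}}^*) = \lambda \eta(\mathbf{x}^*)$. That is, if two
ratings $\tilde{\mathbf{x}}^*$ and $\mathbf{x}^*$ are proportional, then so must
be the values of the regression function at $\tilde{\mathbf{x}}^*$ and
$\mathbf{x}^*$, respectively.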

4. Consistency. In this section, we establish the $L_1$ consistency of the
regression estimate $\eta_n(\mathbf{x}^*)$ toward the regression function
$\eta(\mathbf{x}^*)$. Using $L_1$ consistency is essentially a matter of taste,
and all the subsequent results may be easily adapted to $L_p$ norms without too
much effort. In the proofs, we will make repeated use of the two following
facts. Recall that, for a fixed $i \in \mathcal{R}_n$, the random variable
$\mathbf{X}_i^* = (X_{i1}^*, \ldots, X_{id}^*)$ is defined by

$$X_{ij}^* = \begin{cases} X_{ij}, & \text{if } j \in M,\\ 0, & \text{otherwise,} \end{cases}$$

and $\mathbf{X}_i^{(n)} = \mathbf{X}_i^*$ as soon as $M \subset M_i^{n+1-i}$.
Recall also that, by definition, $\|\mathbf{X}_i^*\| \geq 1$.



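Fact 4.1. For each $i \in \mathcal{R}_n$,

$$S(\mathbf{X}^*, \mathbf{X}_i^*) = \bar{S}(\mathbf{X}^*, \mathbf{X}_i^*) = \cos(\mathbf{X}^*, \mathbf{X}_i^*) = 1 - \frac{1}{2}\, d^2\left(\frac{\mathbf{X}^*}{\|\mathbf{X}^*\|}, \frac{\mathbf{X}_i^*}{\|\mathbf{X}_i^*\|}\right),$$

where $d$ is the usual Euclidean distance on $\mathbb{R}^d$.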

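Fact 4.2. Let, for all $i \geq 1$,

$$T_i = \min\{k \geq i : M_i^{k+1-i} \supset M\}$$

be the first time instant when user $i$ has rated all the films indexed by $M$.
Set

$$(4.1) \qquad \mathcal{L}_n = \{i \in \mathcal{R}_n : T_i \leq n\}$$

and define, for $i \in \mathcal{L}_n$,

$$W_{ni}^*(\mathbf{x}^*) = \begin{cases} 1/k_n, & \text{if } \mathbf{X}_i^* \text{ is among the } k_n\text{-MS of } \mathbf{x}^* \text{ in } \{\mathbf{X}_i^*, i \in \mathcal{L}_n\},\\ 0, & \text{otherwise.} \end{cases}$$

Then

$$W_{ni}^*(\mathbf{x}^*) = \begin{cases} 1/k_n, & \text{if } \dfrac{\mathbf{X}_i^*}{\|\mathbf{X}_i^*\|} \text{ is among the } k_n\text{-NN of } \dfrac{\mathbf{x}^*}{\|\mathbf{x}^*\|} \text{ in } \left\{\dfrac{\mathbf{X}_i^*}{\|\mathbf{X}_i^*\|}, i \in \mathcal{L}_n\right\},\\ 0, & \text{otherwise,} \end{cases}$$

where the $k_n$-NN are evaluated with respect to the Euclidean distance on
$\mathbb{R}^d$. That is, the $W_{ni}^*(\mathbf{x}^*)$ are the usual Euclidean NN
weights [9], indexed by the random set $\mathcal{L}_n$.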

Recall that $|\mathcal{R}_n|$ represents the number of users who have already
provided information about the variable of interest (the movie Titanic in our
example) at time $n$. We are now in a position to state the main result of this
section.

Theorem 4.1. Suppose that $|M| \geq 2$ and that assumption (F) is satisfied.
Suppose that $k_n \to \infty$, $|\mathcal{R}_n| \to \infty$ a.s. and
$\mathbb{E}[k_n / |\mathcal{R}_n|] \to 0$ as $n \to \infty$. Then

$$\mathbb{E}|\eta_n(\mathbf{X}^*) - \eta(\mathbf{X}^*)| \to 0 \quad \text{as } n \to \infty.$$

Thus, to achieve consistency, the number of nearest neighbors kn, over 
which one averages in order to estimate the regression function, should on 
one hand, tend to infinity but should, on the other hand, be small with 
respect to the cardinality of the subset of database users who have already 
rated the item of interest. We illustrate this result by working out two ex- 
amples. 

Example 4.1. Consider, to start with, the somewhat ideal situation
where all users in the database have rated the item of interest. In this case,
$\mathcal{R}_n = \{1, \ldots, n\}$ and the asymptotic conditions on $k_n$ become
$k_n \to \infty$ and $k_n / n \to 0$ as $n \to \infty$. These are just the
well-known conditions ensuring consistency of the usual (i.e., Euclidean) NN
regression estimate ([9], Chapter 6).

Example 4.2. In this more sophisticated model, we recursively define
the sequence $(\mathcal{R}_n)_n$ as follows. Fix, for simplicity,
$\mathcal{R}_1 = \{1\}$. At step $n \geq 2$, we first decide (or not) to add one
element to $\mathcal{R}_{n-1}$ with probability $p \in (0, 1)$, independently of
the data. If we decide to increase $\mathcal{R}_n$, then we do it by picking a
random variable $B_n$ uniformly over the set
$\{1, \ldots, n\} - \mathcal{R}_{n-1}$, and set
$\mathcal{R}_n = \mathcal{R}_{n-1} \cup \{B_n\}$; otherwise,
$\mathcal{R}_n = \mathcal{R}_{n-1}$. Clearly, $|\mathcal{R}_n| - 1$ is a sum of
$n - 1$ independent Bernoulli random variables with parameter $p$, and it has
therefore a binomial distribution with parameters $n - 1$ and $p$.
Consequently,

$$\mathbb{E}\left[\frac{k_n}{|\mathcal{R}_n|}\right] = \frac{k_n [1 - (1 - p)^n]}{np}.$$

In this setting, consistency holds provided $k_n \to \infty$ and $k_n = o(n)$ as
$n \to \infty$.
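
The closed form for the expectation in Example 4.2 can be checked numerically. The sketch below sums exactly over the binomial distribution of the number of added users; the function names are hypothetical, and the factor k_n is left out since it is deterministic.

```python
from math import comb

def expected_inverse_size(n, p):
    """E[1 / |R_n|] when |R_n| - 1 follows a Binomial(n - 1, p) distribution."""
    return sum(
        comb(n - 1, j) * p**j * (1 - p) ** (n - 1 - j) / (1 + j)
        for j in range(n)
    )

def closed_form(n, p):
    """Closed-form value [1 - (1 - p)^n] / (n p) appearing in Example 4.2."""
    return (1 - (1 - p) ** n) / (n * p)

checks = [(n, p, expected_inverse_size(n, p), closed_form(n, p))
          for n in (1, 5, 50) for p in (0.1, 0.5, 0.9)]
```

Multiplying either value by k_n gives the quantity that must tend to 0 in Theorem 4.1; since the closed form is of order 1/(np), this happens exactly when k_n = o(n).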



In the sequel, the letter C will denote a positive constant, the value of 
which may vary from line to line. Proof of Theorem 4.1 will strongly rely on 
Facts 4.1, 4.2 and the following proposition. 



Proposition 4.1. Suppose that $|M| \geq 2$ and that assumption (F) is
satisfied. Let $\alpha_{ni} = \mathbb{P}(M^{n+1-i} \not\supset M \mid M)$. Then



E|r/n(X*) - r7(X* 
<C<!E ^'^ 



+ E 



+ E 



+ E 



where $\mathcal{R}_n$ stands for the nonempty subset of users who have already
provided information about the variable of interest at time $n$, and
$\mathcal{L}_n$ is defined in (4.1).



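Proof. Since $\|\mathbf{X}^*\| \leq s\sqrt{d}$, it will be enough to upper bound
the quantity

$$\mathbb{E}\left|\sum_{i \in \mathcal{R}_n} W_{ni}(\mathbf{X}^*) \frac{Y_i}{\|\mathbf{X}_i^{(n)}\|} - \varphi(\mathbf{X}^*)\right|.$$

To this aim, we write

$$\mathbb{E}\left|\sum_{i \in \mathcal{R}_n} W_{ni}(\mathbf{X}^*) \frac{Y_i}{\|\mathbf{X}_i^{(n)}\|} - \varphi(\mathbf{X}^*)\right| \leq \mathbb{E}\left[\sum_{i \in \mathcal{L}_n^c} W_{ni}(\mathbf{X}^*) \frac{Y_i}{\|\mathbf{X}_i^{(n)}\|}\right] + \mathbb{E}\left|\sum_{i \in \mathcal{L}_n} W_{ni}(\mathbf{X}^*) \frac{Y_i}{\|\mathbf{X}_i^{(n)}\|} - \varphi(\mathbf{X}^*)\right|,$$

where the symbol $A^c$ denotes the complement of the set $A$. Let the event

$$\mathcal{A}_n = [\exists\, i \in \mathcal{L}_n^c : \mathbf{X}_i^{(n)} \text{ is among the } k_n\text{-MS of } \mathbf{X}^* \text{ in } \{\mathbf{X}_i^{(n)}, i \in \mathcal{R}_n\}].$$

Since $\sum_{i \in \mathcal{L}_n^c} W_{ni}(\mathbf{X}^*) \leq 1$, we have

$$\mathbb{E}\left[\sum_{i \in \mathcal{L}_n^c} W_{ni}(\mathbf{X}^*) \frac{Y_i}{\|\mathbf{X}_i^{(n)}\|}\right] \leq s\, \mathbb{P}(\mathcal{A}_n).$$

Observing that, for $i \in \mathcal{L}_n$,
$\mathbf{X}_i^{(n)} = \mathbf{X}_i^*$ and
$W_{ni}(\mathbf{X}^*) \mathbf{1}_{\mathcal{A}_n^c} = W_{ni}^*(\mathbf{X}^*) \mathbf{1}_{\mathcal{A}_n^c}$
(Fact 4.2), we obtain

$$\mathbb{E}\left|\sum_{i \in \mathcal{L}_n} W_{ni}(\mathbf{X}^*) \frac{Y_i}{\|\mathbf{X}_i^{(n)}\|} - \varphi(\mathbf{X}^*)\right| = \mathbb{E}\left[\left|\sum_{i \in \mathcal{L}_n} W_{ni}(\mathbf{X}^*) \frac{Y_i}{\|\mathbf{X}_i^*\|} - \varphi(\mathbf{X}^*)\right| \mathbf{1}_{\mathcal{A}_n}\right] + \mathbb{E}\left[\left|\sum_{i \in \mathcal{L}_n} W_{ni}(\mathbf{X}^*) \frac{Y_i}{\|\mathbf{X}_i^*\|} - \varphi(\mathbf{X}^*)\right| \mathbf{1}_{\mathcal{A}_n^c}\right]$$

$$\leq s\, \mathbb{P}(\mathcal{A}_n) + \mathbb{E}\left|\sum_{i \in \mathcal{L}_n} W_{ni}^*(\mathbf{X}^*) \frac{Y_i}{\|\mathbf{X}_i^*\|} - \varphi(\mathbf{X}^*)\right|.$$

Applying finally Lemma 6.5 completes the proof of the proposition. □

We are now in a position to prove Theorem 4.1.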

Proof of Theorem 4.1. According to Proposition 4.1, Lemma 6.1 
and Lemma 6.2, the result will be proven if we show that 



E 



Yi 



E^n.(X^)||X. 

For L„eP({l,...,n}), set 



^(X*) 



as n — )• oo. 



1 



"'TL 



Yi 



L{X*/||X*|| is among the fc„-NN of XVI|X*|| in {X*/||X* || ,ieL„}} 1 



X? 



Conditionally on the event $[M = m]$, the random variables $\mathbf{X}^*$ and
$\{\mathbf{X}_i^*, i \in L_n\}$ are independent and identically distributed.
Thus, applying Theorem 6.1 in [9], we obtain



Ve > 3Arr, > l:k„> Am and 



l-^nl 



> Ar 
where we use the notation $\mathbb{E}_m[\cdot] = \mathbb{E}[\cdot \mid M = m]$.
Let $\mathbb{P}_m(\cdot) = \mathbb{P}(\cdot \mid M = m)$. By independence,

LneV{{l,...,n}) 

Consequently, letting $A = \max A_m$, where the maximum is taken over all
possible choices of $m \in \mathcal{P}^*(\{1, \ldots, d\})$, we get, for all $n$
such that $k_n \geq A$,

L„eVi{l,...,n}) 

\L/n I ^-A/ljt, 

L„er{{l,...,n}) 

\L/rL I 'CA^kn 

<e + sFm{\Cn\<Akn). 

Therefore, 

E|Z2J =E[E[\Z2J\M]]<e + sF{\Cn\<Akn). 
Moreover, by Lemma 6.2, 

^ ^ 1 — I " I ] — 7- oo in probability as n — ?• oo. 



kn kri \ |7^- 

Thus for all e > 0, limsup„_j.oQE|Z£^| < e, whence E\Z'1_J\ — )• as n — )• oo. 
This shows the desired result. □ 

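5. Rates of convergence. In this section, we bound the rate of convergence
of $\mathbb{E}|\eta_n(\mathbf{X}^*) - \eta(\mathbf{X}^*)|$ for the cosine-type
$k_n$-NN regression estimate. To reach this objective, we will require that the
function

$$\varphi(\mathbf{x}^*) = \mathbb{E}\left[\frac{Y}{\|\mathbf{X}^*\|} \,\Big|\, \frac{\mathbf{X}^*}{\|\mathbf{X}^*\|} = \frac{\mathbf{x}^*}{\|\mathbf{x}^*\|}\right]$$

satisfies a Lipschitz-type property with respect to the similarity $\bar{S}$.
More precisely, we say that $\varphi$ is Lipschitz with respect to $\bar{S}$ if
there exists a constant $C > 0$ such that, for all $\mathbf{x}$ and
$\mathbf{x}'$ in $\mathbb{R}^d$,

$$|\varphi(\mathbf{x}) - \varphi(\mathbf{x}')| \leq C \sqrt{1 - \bar{S}(\mathbf{x}, \mathbf{x}')}.$$

In particular, for $\mathbf{x}$ and $\mathbf{x}' \in \mathbb{R}^d - \mathbf{0}$
with the same null components, this property can be rewritten as

$$|\varphi(\mathbf{x}) - \varphi(\mathbf{x}')| \leq \frac{C}{\sqrt{2}}\, d\left(\frac{\mathbf{x}}{\|\mathbf{x}\|}, \frac{\mathbf{x}'}{\|\mathbf{x}'\|}\right),$$

where we recall that $d$ denotes Euclidean distance.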
Theorem 5.1. Suppose that assumption (F) is satisfied and that $\varphi$ is
Lipschitz with respect to $\bar{S}$. Let
$\alpha_{ni} = \mathbb{P}(M^{n+1-i} \not\supset M \mid M)$, and assume that
$|M| \geq 4$. Then there exists $C > 0$ such that, for all $n \geq 1$,



" k 



<C<^E 



{Tin 



+ E 



n 



Or, 



+ E 



kn 



+ 



where $P_n = 1/(|M| - 1)$ if $k_n \leq |\mathcal{R}_n|$, and $P_n = 1$ otherwise.

To get an intuition on the meaning of Theorem 5.1, it helps to note that
the terms depending on $\alpha_{ni}$ measure the influence of the unrated items
on the performance of the estimate. Clearly, this performance improves as
the $\alpha_{ni}$ decrease, that is, as the proportion of rated items grows. On
the other hand, the term $\mathbb{E}[(k_n / |\mathcal{R}_n|)^{P_n}]$ can be
interpreted as a bias term in dimension $|M| - 1$, whereas $1/\sqrt{k_n}$
represents a variance term. As usual in nonparametric estimation, the rate of
convergence of the estimate deteriorates dramatically as $|M|$ becomes large.
However, in practice, this drawback may be circumvented by using preliminary
dimension reduction steps, such as factorial methods (PCA, etc.) or inverse
regression methods (SIR, etc.).

Example 5.1 (Example 4.1, continued). Recall that we assume, in this
ideal model, that $\mathcal{R}_n = \{1, \ldots, n\}$. Suppose in addition that
$M = \{1, \ldots, d\}$, that is, any new user in the database rates all products
the first time he enters the database. Then the upper bound of Theorem 5.1
becomes

$$\mathbb{E}|\eta_n(\mathbf{X}^*) - \eta(\mathbf{X}^*)| = O\left(\left(\frac{k_n}{n}\right)^{1/(d-1)} + \frac{1}{\sqrt{k_n}}\right).$$

Since neither $\mathcal{R}_n$ nor $M$ is random in this model, we see that there
is no influence of the dynamical rating process. Besides, we recognize the
usual rate of convergence of the Euclidean NN regression estimate ([9], Chapter
6) in dimension $d - 1$. In particular, the choice $k_n \sim n^{2/(d+1)}$ leads
to

$$\mathbb{E}|\eta_n(\mathbf{X}^*) - \eta(\mathbf{X}^*)| = O(n^{-1/(d+1)}).$$

Note that we are led to a $(d - 1)$-dimensional rate of convergence (instead
of the usual $d$) just because everything happens as if the data is projected
on the unit sphere of $\mathbb{R}^d$.
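
For completeness, the choice of $k_n$ in this example can be recovered by balancing the two terms of the bound. The following short derivation is a sketch that ignores multiplicative constants.

```latex
\[
\left(\frac{k_n}{n}\right)^{1/(d-1)} = k_n^{-1/2}
\;\Longleftrightarrow\;
k_n^{\frac{1}{d-1}+\frac{1}{2}} = n^{\frac{1}{d-1}}
\;\Longleftrightarrow\;
k_n^{\frac{d+1}{2(d-1)}} = n^{\frac{1}{d-1}}
\;\Longleftrightarrow\;
k_n = n^{2/(d+1)},
\]
\[
\text{and then}\qquad \frac{1}{\sqrt{k_n}} = n^{-1/(d+1)} .
\]
```

The same computation with $d - 1$ replaced by 3 gives $k_n \sim n^{2/5}$ and the rate $n^{-1/5}$.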



Example 5.2 (Example 4.2, continued). In addition to model 4.2, we
suppose that at each time, a user entering the game reveals his preferences
according to the following sequential procedure. At time 1, the user rates
exactly 4 items by randomly guessing in $\{1, \ldots, d\}$. At time 2, he
updates his preferences by adding exactly one rating among his unrated items,
randomly chosen in $\{1, \ldots, d\} - M_i^1$. Similarly, at time 3, the user
revises his preferences according to a new item uniformly selected in
$\{1, \ldots, d\} - M_i^2$, and so on. In such a scenario,
$|M_i^j| = \min(d, j + 3)$ and thus, $M_i^j = \{1, \ldots, d\}$ for
$j \geq d - 3$. Moreover, since $|M| = 4$, a moment's thought shows that

$$\alpha_{ni} = \begin{cases} 0, & \text{if } i \leq n - d + 4,\\[4pt] 1 - \dfrac{\binom{n+4-i}{4}}{\binom{d}{4}}, & \text{if } n - d + 5 \leq i \leq n. \end{cases}$$



Assuming $n > d - 5$, we obtain

\[
\sum_{i\in\mathcal{R}_n}\alpha_{ni}
\le \sum_{i=n-d+5}^{n}
\Big[1 - \frac{(n+4-i)(n+3-i)(n+2-i)(n+1-i)}{d(d-1)(d-2)(d-3)}\Big]
\le (d-4)\Big(1 - \frac{24}{d(d-1)(d-2)(d-3)}\Big).
\]
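
The probabilities $\alpha_{ni}$ of this example can be evaluated exactly.
The sketch below assumes, as an interpretation of the scenario, that the
set of items rated by a user after $j$ steps is a uniformly random subset
of size $\min(d, j+3)$, and that $|M| = 4$. The names `alpha` and
`sum_bound` are hypothetical helpers.

```python
from math import comb


def alpha(n: int, i: int, d: int) -> float:
    """alpha_{ni} in the scenario of Example 5.2, assuming |M| = 4.

    A user who entered at time i has rated m = min(d, n + 4 - i) items,
    assumed uniformly distributed among subsets of size m, by time n.
    """
    if not 1 <= i <= n or d < 4:
        raise ValueError("need 1 <= i <= n and d >= 4")
    m = min(d, n + 4 - i)
    # P(all 4 items of M are rated) = C(d-4, m-4) / C(d, m) = C(m, 4) / C(d, 4)
    return 1.0 - comb(d - 4, m - 4) / comb(d, m)


def sum_bound(d: int) -> float:
    """Upper bound (d-4) * (1 - 24 / (d(d-1)(d-2)(d-3))) on the sum of alphas."""
    return (d - 4) * (1.0 - 24.0 / (d * (d - 1) * (d - 2) * (d - 3)))


n, d = 30, 8
total = sum(alpha(n, i, d) for i in range(1, n + 1))
```

Only the $d - 4$ most recent users contribute to `total`; every user with
$i \le n - d + 4$ has already rated all items and has $\alpha_{ni} = 0$.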
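Similarly, letting $\mathcal{R}_{n0} = \mathcal{R}_n \cap \{n-d+5, \dots, n\}$,
we have

\[
\prod_{i\in\mathcal{R}_n}\alpha_{ni}
= \prod_{i\in\mathcal{R}_{n0}}\alpha_{ni}\,
\mathbf{1}_{\{\min(\mathcal{R}_n)\ge n-d+5\}}
\le \Big(1 - \frac{24}{d(d-1)(d-2)(d-3)}\Big)
\mathbf{1}_{\{\min(\mathcal{R}_n)\ge n-d+5\}}.
\]

Since $|\mathcal{R}_n| - 1$ has binomial distribution with parameters
$n - 1$ and $p$, we obtain

\[
\mathbb{E}\Big[\prod_{i\in\mathcal{R}_n}\alpha_{ni}\Big]
\le \mathbb{P}(\min(\mathcal{R}_n) \ge n-d+5)
\le \mathbb{P}(|\mathcal{R}_n| - 1 \le d-5)
\le \frac{C}{n}.
\]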



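Finally, applying Jensen's inequality,

\[
\mathbb{E}\Big[\Big(\frac{k_n}{|\mathcal{R}_n|}\Big)^{P_n}\Big]
= \mathbb{E}\Big[\Big(\frac{k_n}{|\mathcal{R}_n|}\Big)^{1/3}
\mathbf{1}_{\{k_n \le |\mathcal{R}_n|\}}\Big]
+ \mathbb{E}\Big[\frac{k_n}{|\mathcal{R}_n|}
\mathbf{1}_{\{k_n > |\mathcal{R}_n|\}}\Big]
\le C\Big(\mathbb{E}\Big[\frac{k_n}{|\mathcal{R}_n|}\Big]\Big)^{1/3}
\le C\Big(\frac{k_n}{n}\Big)^{1/3}.
\]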



Putting all the pieces together, we get with Theorem 5.1

\[
\mathbb{E}|\eta_n(X^\star) - \eta(X^\star)|
= O\Big(\Big(\frac{k_n}{n}\Big)^{1/3} + \frac{1}{\sqrt{k_n}}\Big).
\]

In particular, the choice $k_n \sim n^{2/5}$ leads to

\[
\mathbb{E}|\eta_n(X^\star) - \eta(X^\star)| = O(n^{-1/5}),
\]

which is the usual NN regression estimate rate of convergence when the data
is projected on the unit sphere of $\mathbb{R}^4$.

Proof of Theorem 5.1. Starting from Proposition 4.1, we just need 
to upper bound the quantity 



E 



Yi 



X7 



A combination of Lemma 6.6 and the proof of Theorem 6.2 in [9] shows that 



E 



(5.1) 



< c 



X? 



+ E 



We obtain 



E 



I. \ 1/(1^1-1) 



\C I 

\'~'n\ 

= E 



+ P(£„ = 0) \. 



|7^„|(l-|£^|/|7^„| 



L{|£-|<|7^„|/2} 



+ E 



l/(|Mhl) 



< E 



1/(|M|-1)- 



l{|£^J>|7e„|/2}l{£„^0} 



+ E[fey(l^^|-l)l{|^e|>|^„|/2}]. 



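Since $|M| \ge 4$, one has $2^{1/(|M|-1)} \le 2$ and
$k_n^{1/(|M|-1)} \le k_n$ in the rightmost term, so that, thanks to
Lemma 6.2,

\[
\mathbb{E}\Big[\Big(\frac{k_n}{|\mathcal{L}_n|}\Big)^{1/(|M|-1)}
\mathbf{1}_{\{\mathcal{L}_n \ne \emptyset\}}\Big]
\le C\Big\{
\mathbb{E}\Big[\Big(\frac{k_n}{|\mathcal{R}_n|}\Big)^{1/(|M|-1)}\Big]
+ \mathbb{E}\Big[\frac{k_n}{|\mathcal{R}_n|}
\sum_{i\in\mathcal{R}_n}\alpha_{ni}\Big]\Big\}.
\]

The theorem is a straightforward combination of Proposition 4.1, inequality
(5.1) and Lemma 6.1. □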
6. Technical lemmas. Before stating some technical lemmas, we remind
the reader that $\mathcal{R}_n$ stands for the nonempty subset of
$\{1, \dots, n\}$ of users who have already rated the variable of interest
at time $n$. Recall also that, for all $i \ge 1$,

\[
T_i = \min(k \ge i : M_i^{k+1-i} \supset M)
\]

and

\[
\mathcal{L}_n = \{i \in \mathcal{R}_n : T_i \le n\}.
\]



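Lemma 6.1. We have

\[
\mathbb{P}(\mathcal{L}_n = \emptyset)
= \mathbb{E}\Big[\prod_{i\in\mathcal{R}_n}\alpha_{ni}\Big] \to 0
\quad\text{as } n \to \infty.
\]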



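Proof. Conditionally on $M$ and $\mathcal{R}_n$, the random variables
$\{T_i, i \in \mathcal{R}_n\}$ are independent. Moreover, the sequence
$(M_i^n)_{n\ge1}$ is nondecreasing. Thus, the identity
$[T_i > n] = [M_i^{n+1-i} \not\supset M]$ holds for all
$i \in \mathcal{R}_n$. Hence,

\[
\begin{aligned}
\mathbb{P}(\mathcal{L}_n = \emptyset)
&= \mathbb{P}(\forall i \in \mathcal{R}_n : T_i > n)\\
&= \mathbb{E}[\mathbb{P}(\forall i \in \mathcal{R}_n : T_i > n \mid \mathcal{R}_n, M)]\\
&= \mathbb{E}\Big[\prod_{i\in\mathcal{R}_n}\mathbb{P}(T_i > n \mid \mathcal{R}_n, M)\Big]\\
&= \mathbb{E}\Big[\prod_{i\in\mathcal{R}_n}\mathbb{P}(M_i^{n+1-i} \not\supset M \mid M)\Big]
\quad\text{[by independence of $(M_i^{n+1-i}, M)$ and $\mathcal{R}_n$]}\\
&= \mathbb{E}\Big[\prod_{i\in\mathcal{R}_n}\alpha_{ni}\Big].
\end{aligned}
\]

The last statement of the lemma is clear since, for all $i$,
$\alpha_{ni} \to 0$ a.s. as $n \to \infty$. □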



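Lemma 6.2. We have

\[
\mathbb{E}\Big[\frac{|\mathcal{L}_n^c|}{|\mathcal{R}_n|}\Big]
= \mathbb{E}\Big[\frac{1}{|\mathcal{R}_n|}\sum_{i\in\mathcal{R}_n}\alpha_{ni}\Big]
\]

and

\[
\mathbb{E}\Big[\frac{1}{|\mathcal{L}_n|}\mathbf{1}_{\{\mathcal{L}_n\ne\emptyset\}}\Big]
\le 2\,\mathbb{E}\Big[\frac{1}{|\mathcal{R}_n|}\Big]
+ 2\,\mathbb{E}\Big[\frac{1}{|\mathcal{R}_n|}\sum_{i\in\mathcal{R}_n}\alpha_{ni}\Big].
\]

Moreover, if $\lim_{n\to\infty}|\mathcal{R}_n| = \infty$ a.s., then

\[
\lim_{n\to\infty}
\mathbb{E}\Big[\frac{1}{|\mathcal{R}_n|}\sum_{i\in\mathcal{R}_n}\alpha_{ni}\Big] = 0.
\]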



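Proof. First, using the fact that the sequence $(M_i^n)_{n\ge1}$ is
nondecreasing, we see that for all $i \in \mathcal{R}_n$,
$[T_i > n] = [M_i^{n+1-i} \not\supset M]$. Next, recalling that
$\mathcal{R}_n$ is independent of $T_i$ for fixed $i$, we obtain

\[
\mathbb{E}\Big[\frac{|\mathcal{L}_n^c|}{|\mathcal{R}_n|}\Big]
= \mathbb{E}\Big[\frac{1}{|\mathcal{R}_n|}
\sum_{i\in\mathcal{R}_n}\mathbf{1}_{\{T_i > n\}}\Big]
= \mathbb{E}\Big[\frac{1}{|\mathcal{R}_n|}
\sum_{i\in\mathcal{R}_n}\alpha_{ni}\Big],
\]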



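and this proves the first statement of the lemma. Now define
$\mathcal{J}_n = \{n+1-i, i \in \mathcal{R}_n\}$ and observe that

\[
\mathbb{E}\Big[\frac{1}{|\mathcal{R}_n|}\sum_{i\in\mathcal{R}_n}\alpha_{ni}\Big]
= \mathbb{E}\Big[\frac{1}{|\mathcal{J}_n|}
\sum_{j\in\mathcal{J}_n}\mathbb{P}(M_1^j \not\supset M)\Big],
\]

where we used $|\mathcal{J}_n| = |\mathcal{R}_n|$. Since, by assumption,
$|\mathcal{J}_n| = |\mathcal{R}_n| \to \infty$ a.s. as $n \to \infty$ and
$\mathbb{P}(M_1^j \not\supset M) \to 0$ as $j \to \infty$, we obtain

\[
\lim_{n\to\infty}\frac{1}{|\mathcal{J}_n|}
\sum_{j\in\mathcal{J}_n}\mathbb{P}(M_1^j \not\supset M) = 0
\quad\text{a.s.}
\]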



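The conclusion follows by applying Lebesgue's dominated convergence
theorem. The second statement of the lemma is obtained from the following
chain of inequalities:

\[
\begin{aligned}
\mathbb{E}\Big[\frac{1}{|\mathcal{L}_n|}\mathbf{1}_{\{\mathcal{L}_n\ne\emptyset\}}\Big]
&= \mathbb{E}\Big[\frac{1}{|\mathcal{R}_n|(1-|\mathcal{L}_n^c|/|\mathcal{R}_n|)}
\mathbf{1}_{\{\mathcal{L}_n\ne\emptyset\}}\Big]\\
&= \mathbb{E}\Big[\frac{1}{|\mathcal{R}_n|(1-|\mathcal{L}_n^c|/|\mathcal{R}_n|)}
\mathbf{1}_{\{|\mathcal{L}_n^c|\le|\mathcal{R}_n|/2\}}\Big]
+ \mathbb{E}\Big[\frac{1}{|\mathcal{L}_n|}
\mathbf{1}_{\{|\mathcal{L}_n^c|>|\mathcal{R}_n|/2\}}
\mathbf{1}_{\{\mathcal{L}_n\ne\emptyset\}}\Big]\\
&\le 2\,\mathbb{E}\Big[\frac{1}{|\mathcal{R}_n|}\Big]
+ \mathbb{P}\Big(|\mathcal{L}_n^c| > \frac{|\mathcal{R}_n|}{2}\Big)\\
&\le 2\,\mathbb{E}\Big[\frac{1}{|\mathcal{R}_n|}\Big]
+ 2\,\mathbb{E}\Big[\frac{|\mathcal{L}_n^c|}{|\mathcal{R}_n|}\Big].
\end{aligned}
\]

Applying the first part of the lemma completes the proof. □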
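
The second statement of Lemma 6.2 rests on a pointwise fact: for integers
$1 \le \ell \le r$, one has $1/\ell \le 2/r + 2(r-\ell)/r$, where $r$ plays
the role of $|\mathcal{R}_n|$ and $\ell$ that of $|\mathcal{L}_n|$. The
brute-force check below is only an illustrative sanity test over a finite
range; the function name is hypothetical.

```python
from fractions import Fraction


def pointwise_bound_holds(r: int, ell: int) -> bool:
    """Check 1/ell <= 2/r + 2*(r - ell)/r for 1 <= ell <= r, exactly.

    Here r stands for |R_n| and ell for |L_n|, so r - ell is the number of
    users in R_n who have not yet rated every item of M.
    """
    lhs = Fraction(1, ell)
    rhs = Fraction(2, r) + 2 * Fraction(r - ell, r)
    return lhs <= rhs


all_ok = all(
    pointwise_bound_holds(r, ell)
    for r in range(1, 201)
    for ell in range(1, r + 1)
)
```

Taking expectations of this pointwise inequality on the event where
$\mathcal{L}_n$ is nonempty gives the bound of the lemma.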

Lemma 6.3. Denote by $Z^\star$ and $Z_1^\star$ the random variables
$Z^\star = X^\star/\|X^\star\|$, $Z_1^\star = X_1^\star/\|X_1^\star\|$, and
let $\xi(Z^\star) = \mathbb{P}(S(Z^\star, Z_1^\star) > 1/2 \mid Z^\star)$.
Then



¥i2kn>\Cn\C{Z*)\Cn,M)<2E 









1 






r 


E 


M 


_|7^n| 
\'-'n\



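Proof. If $M$ is fixed, $Z^\star$ is independent of $\mathcal{L}_n$ and
$\mathcal{R}_n$. Thus, by Markov's inequality,

\[
\begin{aligned}
\mathbb{P}(2k_n > |\mathcal{L}_n|\xi(Z^\star) \mid \mathcal{L}_n, M, \mathcal{R}_n)
&= \mathbb{P}(2k_n > |\mathcal{R}_n|\xi(Z^\star) - |\mathcal{L}_n^c|\xi(Z^\star)
\mid \mathcal{L}_n, M, \mathcal{R}_n)\\
&= \mathbb{P}(2k_n + |\mathcal{L}_n^c|\xi(Z^\star) > |\mathcal{R}_n|\xi(Z^\star)
\mid \mathcal{L}_n, M, \mathcal{R}_n)\\
&\le \frac{2k_n}{|\mathcal{R}_n|}\,
\mathbb{E}\Big[\frac{1}{\xi(Z^\star)}\,\Big|\,M\Big]
+ \frac{|\mathcal{L}_n^c|}{|\mathcal{R}_n|}.
\end{aligned}
\]

The proof is completed by observing that $\mathcal{R}_n$ and $M$ are
independent random variables. □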



Let $B(x, \varepsilon)$ be the closed Euclidean ball in $\mathbb{R}^d$
centered at $x$ of radius $\varepsilon$. Recall that the support of a
probability measure $\mu$ is defined as the closure of the collection of
all $x$ with $\mu(B(x, \varepsilon)) > 0$ for all $\varepsilon > 0$. The
next lemma can be proved with a slight modification of the proof of
Lemma 10.2 in [8].

Lemma 6.4. Let $\mu$ be a probability measure on $\mathbb{R}^d$ with a
compact support. Then

\[
\int \frac{1}{\mu(B(x, r))}\,\mu(dx) \le C
\]

with $C > 0$ a constant depending upon $d$ and $r$ only.

Lemma 6.5. Suppose that $|M| \ge 2$, and let the event

\[
A_n = [\exists i \in \mathcal{L}_n^c : X_i^{(n)} \text{ is among the }
k_n\text{-MS of } X^\star \text{ in } \{X_i^{(n)}, i \in \mathcal{R}_n\}].
\]

Then



+ E 



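Proof. Recall that, for a fixed $i \in \mathcal{R}_n$, the random variable
$X_i^\star = (X_{i1}^\star, \dots, X_{id}^\star)$ is defined by

\[
X_{ij}^\star =
\begin{cases}
X_{ij}, & \text{if } j \in M,\\
0, & \text{otherwise,}
\end{cases}
\]

and $X_i^{(n)} = X_i^\star$ as soon as $M \subset M_i^{n+1-i}$.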
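We first prove the inclusion

\[
\text{(6.1)}\qquad
A_n \subset [\,|\{j \in \mathcal{L}_n : S(X^\star, X_j^\star) > 1/2\}| \le k_n\,].
\]

Take $i \in \mathcal{L}_n^c$ such that $X_i^{(n)}$ is among the $k_n$-MS of
$X^\star$ in $\{X_i^{(n)}, i \in \mathcal{R}_n\}$. Then, for all
$j \in \mathcal{L}_n$ such that $S(X^\star, X_j^\star) > 1/2$, we have

\[
S(X^\star, X_j^\star) > \frac{1}{2}
\ge p_i^{(n)} S(X^\star, X_i^\star) = S(X^\star, X_i^{(n)})
\]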

since pj"^ < 1 - l/\M\ < 1/2 if |M| > 2. If 

\[
|\{j \in \mathcal{L}_n : S(X^\star, X_j^\star) > 1/2\}| > k_n,
\]

then $X_i^{(n)}$ is not among the $k_n$-MS of $X^\star$ among the
$\{X_i^{(n)}, i \in \mathcal{R}_n\}$. This contradicts the assumption on
$X_i^{(n)}$ and proves inclusion (6.1).
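
The argument above reasons about which users are among the $k_n$ most
similar ($k_n$-MS) to the target. The sketch below shows one way to select
them with plain cosine similarity on fully specified rating vectors. It
does not reproduce the paper's treatment of unrated items, and the function
names and the tie-breaking rule (by user index) are assumptions made here.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Cosine of the angle between two nonzero rating vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


def k_most_similar(
    target: Sequence[float], users: Dict[int, Sequence[float]], k: int
) -> List[int]:
    """Indices of the k users whose vectors are most similar to `target`.

    Ties are broken by user index; this tie rule is an assumption.
    """
    ranked = sorted(
        users, key=lambda i: (-cosine_similarity(target, users[i]), i)
    )
    return ranked[:k]


users = {
    1: [5.0, 4.0, 1.0],
    2: [1.0, 1.0, 5.0],
    3: [4.0, 5.0, 1.0],
    4: [2.0, 2.0, 2.0],
}
neighbors = k_most_similar([5.0, 5.0, 1.0], users, k=2)
```

In this toy data users 1 and 3 are equally similar to the target and both
rank ahead of users 4 and 2.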

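Next, define $Z^\star = X^\star/\|X^\star\|$,
$Z_i^\star = X_i^\star/\|X_i^\star\|$, $i = 1, \dots, n$, and let
$\xi(Z^\star) = \mathbb{P}(S(Z^\star, Z_1^\star) > 1/2 \mid Z^\star)$. If
$k_n - |\mathcal{L}_n|\xi(Z^\star) \le -(1/2)|\mathcal{L}_n|\xi(Z^\star)$
and $\mathcal{L}_n \ne \emptyset$, we deduce from (6.1) that

\[
\begin{aligned}
\mathbb{P}(A_n \mid \mathcal{L}_n, Z^\star)
&\le \mathbb{P}\Big(\sum_{j\in\mathcal{L}_n}
\mathbf{1}_{\{S(Z^\star, Z_j^\star) > 1/2\}} \le k_n
\,\Big|\, \mathcal{L}_n, Z^\star\Big)\\
&= \mathbb{P}\Big(\sum_{j\in\mathcal{L}_n}
\big(\mathbf{1}_{\{S(Z^\star, Z_j^\star) > 1/2\}} - \xi(Z^\star)\big)
\le k_n - |\mathcal{L}_n|\xi(Z^\star)
\,\Big|\, \mathcal{L}_n, Z^\star\Big)\\
&\le \mathbb{P}\Big(\sum_{j\in\mathcal{L}_n}
\big(\mathbf{1}_{\{S(Z^\star, Z_j^\star) > 1/2\}} - \xi(Z^\star)\big)
\le -\frac{1}{2}|\mathcal{L}_n|\xi(Z^\star)
\,\Big|\, \mathcal{L}_n, Z^\star\Big)\\
&\le \frac{4|\mathcal{L}_n|\xi(Z^\star)}{(|\mathcal{L}_n|\xi(Z^\star))^2}
= \frac{4}{|\mathcal{L}_n|\xi(Z^\star)}
\end{aligned}
\]

(by Chebyshev's inequality).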



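In the last inequality, we use the fact that, since
$\sigma(M) \subset \sigma(Z^\star)$, the random variables
$\{Z_i^\star, i \in \mathcal{L}_n\}$ are independent conditionally on
$Z^\star$ and $\mathcal{L}_n$. Using again the inclusion
$\sigma(M) \subset \sigma(Z^\star)$, we obtain, on the event
$[\mathcal{L}_n \ne \emptyset]$,

\[
\begin{aligned}
\mathbb{P}(A_n \mid \mathcal{L}_n, M)
&= \mathbb{E}[\mathbb{P}(A_n \mid \mathcal{L}_n, Z^\star) \mid \mathcal{L}_n, M]\\
&\le \frac{4}{|\mathcal{L}_n|}\,
\mathbb{E}\Big[\frac{1}{\xi(Z^\star)}\,\Big|\,M\Big]
+ \mathbb{P}\Big(k_n - |\mathcal{L}_n|\xi(Z^\star)
> -\frac{1}{2}|\mathcal{L}_n|\xi(Z^\star)
\,\Big|\, \mathcal{L}_n, M\Big)\\
&= \frac{4}{|\mathcal{L}_n|}\,
\mathbb{E}\Big[\frac{1}{\xi(Z^\star)}\,\Big|\,M\Big]
+ \mathbb{P}(|\mathcal{L}_n|\xi(Z^\star) < 2k_n \mid \mathcal{L}_n, M).
\end{aligned}
\]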






Applying Lemma 6.3, on the event [£„ / 0], 



\An\Cn,M)< 



\'~"n\ 

+ E 



M 



+ 2E 



r 

'~"n 



E 



M 



\'~"n\ 



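Moreover, by Fact 4.1,

\[
\xi(Z^\star) = \mathbb{P}(S(Z^\star, Z_1^\star) > 1/2 \mid Z^\star)
\ge \mathbb{P}(d^2(Z^\star, Z_1^\star) \le 1/2 \mid Z^\star).
\]

Thus, denoting by $\nu^M$ the distribution of $Z^\star$ conditionally to
$M$, we deduce from Lemma 6.4 that

\[
\mathbb{E}\Big[\frac{1}{\xi(Z^\star)}\,\Big|\,M\Big]
\le \int \frac{1}{\nu^M(B(z, 1/\sqrt{2}))}\,\nu^M(dz) \le C,
\]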
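
The step above relies on the standard link between cosine similarity and
Euclidean distance after projection onto the unit sphere:
$S(x, x') = 1 - d^2(x/\|x\|, x'/\|x'\|)/2$. A quick numerical check of this
identity, and of the inclusion of events it implies, is sketched below.
The helper names and the random test vectors are arbitrary choices made
for illustration.

```python
import math
import random


def cosine(x, y):
    """Cosine similarity of two nonzero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.hypot(*x) * math.hypot(*y))


def squared_distance_on_sphere(x, y):
    """Squared Euclidean distance between x/||x|| and y/||y||."""
    nx, ny = math.hypot(*x), math.hypot(*y)
    return sum((a / nx - b / ny) ** 2 for a, b in zip(x, y))


rng = random.Random(0)
max_gap = 0.0
for _ in range(1000):
    x = [rng.uniform(0.1, 5.0) for _ in range(4)]
    y = [rng.uniform(0.1, 5.0) for _ in range(4)]
    s, d2 = cosine(x, y), squared_distance_on_sphere(x, y)
    max_gap = max(max_gap, abs(s - (1.0 - d2 / 2.0)))
    # d2 <= 1/2 forces s >= 3/4 > 1/2, which is the inclusion used above.
    assert d2 > 0.5 or s > 0.5
```

Since $d^2 \le 1/2$ implies $S \ge 3/4$, the event on the right-hand side
of the displayed inequality is contained in the event on the left.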



where the constant C does not depend on M. Putting all the pieces together, 
we obtain 





1 












|e 


1^ |1{£„^0} 
. 1 \ 


+ E 


_\nn\_ 


+ E 


.|7^„|. 


} 



< E 



We conclude the proof with Lemmas 6.1 and 6.2. □ 



+ F(£„ = 0) 



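In the sequel, we let $X_{(1)}^\star, \dots, X_{(|\mathcal{L}_n|)}^\star$ be
the sequence $\{X_i^\star, i \in \mathcal{L}_n\}$ reordered according to
decreasing similarities $S(X^\star, X_i^\star)$, $i \in \mathcal{L}_n$,
that is,

\[
S(X^\star, X_{(1)}^\star) \ge \cdots \ge S(X^\star, X_{(|\mathcal{L}_n|)}^\star).
\]

Lemma 6.6 below states the rate of convergence to 1 of
$S(X^\star, X_{(1)}^\star)$.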

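Lemma 6.6. Suppose that $|M| \ge 4$. Then there exists $C > 0$ such that,
on the event $[\mathcal{L}_n \ne \emptyset]$,

\[
1 - \mathbb{E}[S(X^\star, X_{(1)}^\star) \mid M, \mathcal{L}_n]
\le \frac{C}{|\mathcal{L}_n|^{2/(|M|-1)}}.
\]

Proof. Observe that

\[
\begin{aligned}
\mathbb{E}[1 - S(X^\star, X_{(1)}^\star) \mid X^\star, \mathcal{L}_n]
&= \int_0^1 \mathbb{P}(1 - S(X^\star, X_{(1)}^\star) > \varepsilon
\mid X^\star, \mathcal{L}_n)\,d\varepsilon\\
&= \int_0^1 \mathbb{P}(\forall i \in \mathcal{L}_n :
1 - S(X^\star, X_i^\star) > \varepsilon
\mid X^\star, \mathcal{L}_n)\,d\varepsilon.
\end{aligned}
\]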
Jo 
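Since $\sigma(M) \subset \sigma(X^\star)$, given $X^\star$ and
$\mathcal{L}_n$, the random variables $\{X_i^\star, i \in \mathcal{L}_n\}$
are independent and identically distributed. Hence,

\[
\mathbb{E}[1 - S(X^\star, X_{(1)}^\star) \mid X^\star, \mathcal{L}_n]
= \int_0^1 [\mathbb{P}(1 - S(X^\star, X_1^\star) > \varepsilon
\mid X^\star)]^{|\mathcal{L}_n|}\,d\varepsilon.
\]

Denote by $\nu^M$ the conditional distribution of $X^\star/\|X^\star\|$
given $M$. The support of $\nu^M$ is contained in both the unit sphere of
$\mathbb{R}^d$ and in a $|M|$-dimensional vector space. Thus, for
simplicity, we shall consider that the support of $\nu^M$ is contained in
the unit sphere of $\mathbb{R}^{|M|}$. Let $B^{|M|}(x, r)$ be the closed
Euclidean ball in $\mathbb{R}^{|M|}$ centered at $x$ of radius $r$. Since
$X^\star$ (resp., $X_1^\star$) only depends on $M$ and $X$ (resp., $X_1$),
then, given $X^\star$, the random variable $X_1^\star/\|X_1^\star\|$ is
distributed according to $\nu^M$. Thus, for any $\varepsilon > 0$, we may
write (Fact 4.1)

\[
\mathbb{P}(1 - S(X^\star, X_1^\star) > \varepsilon \mid X^\star)
= 1 - \nu^M\Big(B^{|M|}\Big(\frac{X^\star}{\|X^\star\|}, \sqrt{2\varepsilon}\Big)\Big)
\]

and, consequently,

\[
\mathbb{E}[1 - S(X^\star, X_{(1)}^\star) \mid X^\star, \mathcal{L}_n]
= \int_0^1 \Big[1 - \nu^M\Big(B^{|M|}\Big(\frac{X^\star}{\|X^\star\|},
\sqrt{2\varepsilon}\Big)\Big)\Big]^{|\mathcal{L}_n|}\,d\varepsilon.
\]

Using the inclusion $\sigma(M) \subset \sigma(X^\star)$, we obtain

\[
\text{(6.2)}\qquad
\mathbb{E}[1 - S(X^\star, X_{(1)}^\star) \mid M, \mathcal{L}_n]
= \int_0^1 \mathbb{E}\Big[\Big[1 - \nu^M\Big(B^{|M|}\Big(
\frac{X^\star}{\|X^\star\|}, \sqrt{2\varepsilon}\Big)\Big)
\Big]^{|\mathcal{L}_n|}\,\Big|\,M, \mathcal{L}_n\Big]\,d\varepsilon.
\]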



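Fix $\varepsilon > 0$, and denote by $\mathcal{S}(M)$ the support of
$\nu^M$. There exist Euclidean balls $A_1, \dots, A_{N(\varepsilon)}$ in
$\mathbb{R}^{|M|}$ with radius $\sqrt{2\varepsilon}/2$ such that

\[
\mathcal{S}(M) \subset \bigcup_{j=1}^{N(\varepsilon)} A_j
\quad\text{and}\quad
N(\varepsilon) \le \frac{C}{\varepsilon^{(|M|-1)/2}}
\]

for some $C > 0$ which may be chosen independently of $M$. Clearly, if
$x \in A_j \cap \mathcal{S}(M)$, then
$A_j \subset B^{|M|}(x, \sqrt{2\varepsilon})$. Thus

\[
\begin{aligned}
\mathbb{E}\Big[\Big[1 - \nu^M\Big(B^{|M|}\Big(
\frac{X^\star}{\|X^\star\|}, \sqrt{2\varepsilon}\Big)\Big)
\Big]^{|\mathcal{L}_n|}\,\Big|\,M, \mathcal{L}_n\Big]
&\le \sum_{j=1}^{N(\varepsilon)}
\mathbb{E}\Big[\mathbf{1}_{\{X^\star/\|X^\star\| \in A_j\}}
\Big[1 - \nu^M\Big(B^{|M|}\Big(\frac{X^\star}{\|X^\star\|},
\sqrt{2\varepsilon}\Big)\Big)\Big]^{|\mathcal{L}_n|}
\,\Big|\,M, \mathcal{L}_n\Big]\\
&\le \sum_{j=1}^{N(\varepsilon)} \nu^M(A_j)[1 - \nu^M(A_j)]^{|\mathcal{L}_n|}\\
&\le N(\varepsilon)\max_{t\in[0,1]} t(1-t)^{|\mathcal{L}_n|}\\
&\le \frac{C}{|\mathcal{L}_n|\,\varepsilon^{(|M|-1)/2}}.
\end{aligned}
\]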



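Combining this inequality and equality (6.2), we obtain

\[
\mathbb{E}[1 - S(X^\star, X_{(1)}^\star) \mid M, \mathcal{L}_n]
\le \int_0^1 \min\Big(1,
\frac{C}{|\mathcal{L}_n|\,\varepsilon^{(|M|-1)/2}}\Big)\,d\varepsilon.
\]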
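
An integral of this type can be bounded by splitting it at the point where
the two arguments of the minimum coincide. The worked computation below is
a sketch under the assumption that the integrand has the form
$\min(1, C L^{-1}\varepsilon^{-a})$, with the shorthand $a = (|M|-1)/2$,
$L = |\mathcal{L}_n|$ and $\varepsilon_0 = (C/L)^{1/a}$ introduced here for
convenience. The condition $|M| \ge 4$ gives $a \ge 3/2 > 1$, which is what
makes the tail integrable.

```latex
\begin{align*}
\int_0^1 \min\Big(1, \frac{C}{L\,\varepsilon^{a}}\Big)\,d\varepsilon
 &\le \int_0^{\varepsilon_0} d\varepsilon
   + \frac{C}{L}\int_{\varepsilon_0}^{\infty} \varepsilon^{-a}\,d\varepsilon \\
 &= \varepsilon_0 + \frac{C}{L}\,\frac{\varepsilon_0^{1-a}}{a-1}
  = \varepsilon_0\Big(1 + \frac{1}{a-1}\Big) \\
 &= \Big(1 + \frac{2}{|M|-3}\Big)\Big(\frac{C}{L}\Big)^{2/(|M|-1)} .
\end{align*}
```

The second line uses $(C/L)\,\varepsilon_0^{-a} = 1$, so the tail
contributes $\varepsilon_0/(a-1)$ and the whole bound is of order
$L^{-2/(|M|-1)}$.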



Since $|M| \ge 4$, an easy calculation shows that there exists $C > 0$ such
that

\[
\mathbb{E}[1 - S(X^\star, X_{(1)}^\star) \mid M, \mathcal{L}_n]
\le \frac{C}{|\mathcal{L}_n|^{2/(|M|-1)}},
\]

which leads to the desired result. □

Acknowledgments. The authors are greatly indebted to Albert Benveniste
for pointing out this problem. They also thank Kevin Bleakley and
Toby Hocking for their careful reading of the paper, and two referees and
the Associate Editor for valuable comments and insightful suggestions.

REFERENCES

[1] Abernethy, J., Bach, F., Evgeniou, T. and Vert, J.-P. (2009). A new
    approach to collaborative filtering: Operator estimation with spectral
    regularization. J. Mach. Learn. Res. 10 803-826.

[2] Adomavicius, G., Sankaranarayanan, R., Sen, S. and Tuzhilin, A. (2005).
    Incorporating contextual information in recommender systems using a
    multidimensional approach. ACM Trans. Inform. Syst. 23 103-145.

[3] Adomavicius, G. and Tuzhilin, A. (2005). Toward the next generation of
    recommender systems: A survey of the state-of-the-art and possible
    extensions. IEEE Trans. Knowl. Data Eng. 17 734-749.

[4] Breese, J., Heckerman, D. and Kadie, C. (1998). Empirical analysis of
    predictive algorithms for collaborative filtering. In Proceedings of
    14th Conference on Uncertainty in Artificial Intelligence 43-52.
    Morgan Kaufmann, San Francisco, CA.

[5] Candes, E. and Plan, Y. (2009). Matrix completion with noise. Submitted.
    Available at
    http://www.acm.caltech.edu/~emmanuel/papers/NoisyCompletion.pdf.

[6] Candes, E. and Recht, B. (2009). Exact matrix completion via convex
    optimization. Found. Comput. Math. 9 717-772.


[7] Choi, S., Kang, S. and Jeon, Y. (2006). Personalized recommendation
    system based on product specification values. Expert Systems with
    Applications 31 607-616.

[8] Devroye, L., Gyorfi, L. and Lugosi, G. (1996). A Probabilistic Theory of
    Pattern Recognition. Springer, New York. MR1383093

[9] Gyorfi, L., Kohler, M., Krzyzak, A. and Walk, H. (2002). A
    Distribution-Free Theory of Nonparametric Regression. Springer, Berlin.
    MR1920390

[10] Heckerman, D., Chickering, D., Meek, C., Rounthwaite, R. and Kadie, C.
    (2000). Dependency networks for density estimation, collaborative
    filtering, and data visualization. J. Mach. Learn. Res. 1 49-75.

[11] Hill, W., Stead, L., Rosenstein, M. and Furnas, G. (1995). Recommending
    and evaluating choices in a virtual community of use. In Proceedings of
    ACM CHI'95 Conference on Human Factors in Computing Systems 194-201.
    ACM Press, New York.

[12] Montaner, M., Lopez, B. and Rosa, J. (2003). A taxonomy of recommender
    agents on the Internet. Artificial Intelligence Review 19 285-330.

[13] Resnick, P., Iakovou, N., Sushak, M., Bergstrom, P. and Riedl, J.
    (1994). Grouplens: An open architecture for collaborative filtering of
    netnews. In Proceedings of the 1994 Computer Supported Cooperative Work
    Conference 175-186. ACM Press, New York.

[14] Salakhutdinov, R., Mnih, A. and Hinton, G. (2007). Restricted Boltzmann
    machines for collaborative filtering. In Proceedings of the 24th
    International Conference on Machine Learning 791-798. ACM Press,
    New York.

[15] Sarwar, B., Karypis, G., Konstan, J. and Riedl, J. (2001). Item-based
    collaborative filtering recommendation algorithms. In Proceedings of the
    10th International WWW Conference 285-295. ACM Press, New York.

[16] Shardanand, U. and Maes, P. (1995). Social information filtering:
    Algorithms for automating "word of mouth." In Proceedings of the
    Conference on Human Factors in Computing Systems 210-217. ACM Press,
    New York.

G. Biau
LSTA and LPMA
Universite Paris VI
Boite 158, 175 rue du Chevaleret
75013 Paris
France
E-mail: gerard.biau@upmc.fr
URL: http://www.lsta.upmc.fr/biau.html

B. Cadre
IRMAR, ENS Cachan Bretagne, CNRS, UEB
Campus de Ker Lann
Avenue Robert Schuman
35170 Bruz
France
E-mail: benoit.cadre@bretagne.ens-cachan.fr
URL: http://w3.bretagne.ens-cachan.fr/math/people/benoit.cadre

L. Rouviere
CREST-ENSAI, IRMAR, UEB
Campus de Ker Lann
Rue Blaise Pascal, BP 37203
35172 Bruz Cedex
France
E-mail: laurent.rouviere@ensai.fr
URL: http://www.ensai.com/laurent-rouviere-rub,78.html



